GENERAL INFO ABOUT AREA AND TIMESTEP & RUN

# newarea can be 0 (no) or 1 (yes). If area is not new, it is assumed that the land outline shapefile and cropped rock outcrop is already available and doesn't need to be unzipped etc.
newarea <- 0
areaname <- "MDV"

# L8: either "Bt" or "L1"
L8downloadtype <- "Bt"


## time range parameters 
year <- c(2020:2013)
month <- c("01","02","03","04", "09", "10","11", "12")

# maximum time between satellite scenes 
timethres <- 0.6

Set Paths

Micro IR Seagate = E IRONWOLF / ELEMENTS = D

scriptpath <- "C:/Users/mleza/OneDrive/Documents/PhD/work_packages/auto_downscaling_30m/downscale_controlscripts/data_prep/"
scriptpath_organized <- "C:/Users/mleza/OneDrive/Documents/PhD/work_packages/auto_downscaling_30m/downscale_controlscripts/data_prep/downscaling_organized/"

maindir <- "D:/downscaling_after_talk/" # this must point to an existing directory, rest is generated in setup
main <- paste0(maindir, "data_download_preprocessing/")
L8datpath <- paste0(main, "L8/")
modispath <- paste0(main, "MODIS/")
tdpath <-paste0(main, "timediff/")
cddir <- paste0(maindir, "clean_data/")
figurepath <- "C:/Users/mleza/OneDrive/Documents/PhD/work_packages/auto_downscaling_30m/paper/paper_draft/figures/"


######## set path to DEM, AOI, land outline

dempath <- "E:/new_downscaling/tiles_westcoast/" # this must point to an existing directory with dem inside
aoipath <-  "E:/new_downscaling/aoi/" # this must point to an existing directory with "Levy_MDV_actually.shp"
aoip <- list.files(aoipath, pattern="actually.shp", full.names = T)
clpath <- "E:/new_downscaling/coastline/Coastline_high_res_polygon/"

Call Setup

  • load libraries
  • make time range list
  • create directories for Landsat and MODIS data
  • read AOI shape
  • define and source some functions
source(paste0(scriptpath_organized, "0a_setup.R"))
## Error in get(genname, envir = envir) : 
##   Objekt 'testthat_print' nicht gefunden
## OGR data source with driver: ESRI Shapefile 
## Source: "E:\new_downscaling\aoi\Levy_MDV_actually.shp", layer: "Levy_MDV_actually"
## with 1 features
## It has 1 fields
## Integer64 fields read as strings:  id 
## OGR data source with driver: ESRI Shapefile 
## Source: "E:\new_downscaling\coastline\Coastline_high_res_polygon\land_contours_MDV.shp", layer: "land_contours_MDV"
## with 14678 features
## It has 2 fields
#file.edit(paste0(scriptpath_organized, "0a_setup.R"))
Load template
# this is one raster with a complete coverage of the research area to use as a template 
template <- raster("E:/new_downscaling/clean_data/template_new.tif")

SETUP until here

1 DEM

DEM 8m REMA

file.edit(paste0(scriptpath_organized, "1_DEM.R"))
  • Use DEM tiles “17_34_8m”, “17_35_8m”,“18_35_8m”, “18_34_8m”, “19_34_8m”, “16_35_8m”, “17_36_8m”, “19_35_8m”
  • set everything below 100m to NA
  • make 8m mosaic, everything below -50m goes
  • 3x3 mean filter
  • crop to aoi
  • fill missing values with 200m RAMP
  • resample first to 8m and cover and mask, then resample to 30m to evade sharp cuts
  • calculate slope and aspect
  • make a 20000x20000 blockmask (20km²)
dem <- raster(paste0(dempath, "DEM_30m_", areaname,"_clean_aoi_filled_mask_new.tif"))
blockmask <- raster(paste0(dempath, "blockmask_aoi.tif"))
aspect_deg <- raster(paste0(dempath, "30m_aspectMDV.tif"))
slope_deg <- raster(paste0(dempath, "30m_slopeMDV.tif"))
# aspect_rad <- raster(paste0(dempath, "30m_radians_aspectMDV.tif"))
# slope_rad <- raster(paste0(dempath, "30m_radians_slopeMDV.tif"))


mapview(aoianta, alpha.regions=0.2, na.color="#00000000")+
  mapview(blockmask, na.color="#00000000")+  
  mapview(aspect_deg, na.color="#00000000")+
  mapview(slope_deg, na.color="#00000000")+
  mapview(dem, na.color="#00000000")

2 Landsat

getprocessLANDSAT() is a function(time_range)

login_USGS("MaiteLezama", "Eos300dmmmmlv")
## Login successfull. USGS ERS credentials have been saved for the current session.
login_earthdata(username="Mlezama", password="Eos300dm")
## Login successfull. NASA URS EarthData credentials have been saved for the current session.
y=1
m=1
ymid <- substring(time_range[[y]][[m]][[1]][[1]], 1, 7)
#time_range[[y]][[m]]

Landsat data are distributed in UTM - thus, the aoi is converted to UTM 57 south (WGS84). Footprints come in long lat. Per day, the footprints for “bt” data are taken from the query and their intersection with the research area is checked.

The Top of Atmosphere Brightness Temperature (toa_bt) product from Landsat 8 is used, which comes in 30m resolution, Kelvin, fill_value=“-9999” (to be rescaled by 0.1). The valid range is from 1500 to 3500.

Only those scenes are selected, where at least 30% of research area (total 20719560545m² or 20719.6km²) is covered by the scene (i.e. 6906.5km²) in the footprint. Only scenes that also show less than 20% land cloud cover are selected and written as querynew.RDS into the Landsat scene directory. Also “downloadScenes.csv” is written, which contains the date specifics on the downloaded scenes.

Time: Landsat comes in GMT and MODIS in UTC, and there is no time difference between Greenwich Mean Time and Coordinated Universal Time.

To order the scenes, a text file will be generated, with all the record ids. Those will be submitted for ordering on espa, where “Brightness Temperature - Thermal band TOA processing” and “Pixel QA” will be selected. This is a workaround, as ordering takes a lot of runtime, as a maximum of one dataset per 30 min can be ordered. Finally, all is downloaded via the package espa.tools with earthexplorer_download().

file.edit(paste0(scriptpath_organized, "2_1_selectLandsat.R"))
file.edit(paste0(scriptpath_organized, "2_2a_downloadLandsat.R"))
file.edit(paste0(scriptpath_organized, "2_2b_downloadLandsat.R"))
file.edit(paste0(scriptpath_organized, "2_3_processLandsat.R"))
file.edit(paste0(scriptpath_organized, "2_2c_place_scenes.R"))

Identify potentially good scenes:

source(paste0(scriptpath_organized, "2_1_selectLandsat.R"))

for(y in seq(year)){
  for(m in seq(month)){
    checkMyInternet()
    login_USGS("MaiteLezama", "Eos300dmmmmlv")
    try(login_earthdata(username="Mlezama", password="Eos300dm"))
    selectLANDSAT(time_range)
  }
}

“downloadScenes.csv” is where those scenes are, that passed the qualitycheck for clouds. For those scenes, I look for MODIS scenes that are located 20+/- around the L8 date. The time differences are documented in “timediff_df.csv”.

Actualized queries “MODquerymatched_msel.rds” and “L8querymatched.rds” are written out.

From the Landsat 8 query now,

Gather all record ids to submit online to espa

Download by espa ID

Only MODIS data from Aqua have a close match time.

So, what was lost exactly in comparison to the last run, where we had > 300 scenes?

…mostly scenes from Terra (MOD) were lost, that reduces the overall time difference in the dataset. Also, the median land cloud cover decreased.

Process Landsat

In this workflow run, I’m using already downloaded images. If it would be a new download, files would be unzipped. Get date, time, filename and land cloud cover from the MODIS-matched Landsat query and write that info into timediff_df. Identify which quality assessment band codes are regarded here as clouds (cloud shadow, clouds, medium and high confidence cloud and high confidence cirrus).

Help files for selection of MODIS and documentation of time differences are written.

getting LST from Landsat

Although there are patterned differences between Landsat bands 10 and 11, I’m sticking with the single band LST retrieval appraoch, becasue it was recommended to use band 10 rather than 11 due to stray light in the USGS calibration notices: “Additional work is underway to assess whether this correction is adequate for use with the split-window atmospheric correction technique. Until that work is complete, it is not recommended that Band 11 be used for the split-window technique.” USGS L8 calibration notices

d1011 <- raster(paste0(L8datpath, "difference_b10_b11.tif"))
mapview(d1011, map.types="Esri.WorldImagery", na.color="#00000000")

Brightness temperature TOA

btc_ex <- raster(paste0(L8datpath, "BTC_LC08_L1GT_057116_20190106_20190130_01_T2_bt_band10.tif"))
btc_ex_res <- raster(paste0(L8datpath, "BTC_res_LC08_L1GT_057116_20190106_20190130_01_T2_bt_band10.tif"))
LST_ex <- raster(paste0(L8datpath, "LST_LC08_L1GT_057116_20190106_20190130_01_T2_bt_band10.tif.tif"))

mapview(btc_ex, map.types="Esri.WorldImagery", na.color="#00000000")+
  mapview(btc_ex_res, na.color="#00000000")+
  mapview(LST_ex, na.color="#00000000")
diff_BTC_LST <- LST_ex - btc_ex_res
mapview(diff_BTC_LST, na.color="#00000000")
#file.edit(paste0(scriptpath_organized, "2_3_processLandsat.R"))
source(paste0(scriptpath_organized, "2_3_processLandsat.R"))

file_location <- "E:/new_downscaling/data_download_preprocessing/L8"

y=2
m=1
processLandsat(time_range, new_download=FALSE) # new_download=TRUE if all espa downloads are in one folder
workspace.size()
source(paste0(scriptpath_organized, "2a_cloudclean_Landsat_org.R"))
file.edit(paste0(scriptpath_organized, "2a_cloudclean_Landsat_org.R"))

Landsat and MODIS

MODIS LST product stems from channels 31 and 32, here is a comparison of the wavelengths covered by the thermal channels in both sensors:

3 MODIS

file.edit("3_MODIS.R")
# call full workflow for MODIS
downloaded_already = TRUE
new_download = FALSE


source("2_1_selectLandsat.R")
source("3_MODIS.R")
rm(msel)
rm(timediff_comp)
for(y in c(5:length(year))){
  for(m in c(8)){
#      for(m in seq(month)){

    login_USGS("MaiteLezama", "Eos300dmmmmlv")
    login_earthdata(username="Mlezama", password="Eos300dm")
    selectLANDSAT(time_range)
    getprocessMODIS(time_range)
  }
}

Output LST is called “proj_c_warp_LST_…” and Error: “proj_warp_Error”

According to MODIS_LST_products_UserGuide_C5.pdf, MOD11_L2 is masked with the MODIS Cloud Mask data product (MOD35_L2).

The LST retrieval in a MODIS swath is constrained to pixels that: (1). have nominal Level 1B radiance data in bands 31 and 32, (2). are on land or inland water, (3). are in clear-sky conditions at a confidence (defined in MOD35) of >=95% over land <= 2000m or >= 66% over land > 2000m, and at a confidence of >= 66% over lakes.

Taking a look at emissivity and LST error. Example scene from January 2019. Info on scene preparation here.

I used 0.94 as emissivity (Emissivity is defined as the amount of radiation emitted or absorbed by a body compared with that of a black body under identical conditions) for open soil and 0.97 for snow and ice based on the rock outcrop raster from Landsat (eta_res), see paper. The emissivity measured in MODIS product doesn’t really look reliable, it’s very patchy and there’s no detectable variation with land cover type, which should really be the case. Moreover, all values are around 0.99, which seems pretty high.

So it’s better to stick with the literature values.

The LST error lies within 0.2 to 1.8 K for this scene. I can use the error to exclude very unreliable pixels from training. It will be gathered here as well to add more info to the final data frame. Training can be done only on high reliability pixels then, as an experiment.

The Quality control band in MOD11_L2 needs to be converted from decimal to binary format (package luna). Check meaning of qa_bits here. Value of 1 indicates a good quality pixel for cloud state, cloud shadow, cloud flag and high cirrus detected.

##    QC Error
## 1 NaN    NA
## 2 NaN    NA
## 3 NaN    NA
## 4 NaN    NA
## 5 NaN    NA
## 6 NaN    NA

I’m asking myself, whether the algorithm for cloud, cirrus and quality determination just picks up on the open soil areas - they are cut out neatly here and the furthest 30min away scene from Landsat does not show any clouds - not sure…

Take a look at Landsat 8 error for this scene - info on bit readout can be found here

# get L8 scene
lst_L8 <- raster("E:/new_downscaling/data_download_preprocessing/L8/2019-01/LST/LC08_L1GT_224128_20190124_20190205_01_T2_bt_band10.tif")
lst_L8 <- mask(lst_L8, aoianta)

sceneloc <- list.files("E:/new_downscaling/data_download_preprocessing/L8/2019-01/get_data/LANDSAT/BT/", pattern="LC08_L1GT_224128_20190124_20190205_01_T2",
                       full.names=T)
scenefolder <- list.files(sceneloc, pattern="Bt$", full.names = T)
pqa <- raster(list.files(scenefolder, pattern="pixel_qa", full.names=T))

from <- c(6,1)
to   <- c(7,1)
reject <- c("11,10", "0")
qa_bits <- cbind(from, to, reject)

qc <- rast(pqa)
quality_mask <- modis_mask(qc, 8, qa_bits)
qm <- raster(quality_mask)

mapview(pqa, na.color="#00000000")+mapview(qm, na.color="#00000000")+mapview(lst_L8, na.color="#00000000")

It is not clear why so many well-looking values should be eliminated.. I guess, the Antarctica error issue comes into play and perhaps thus errors should not be taken into account here.

Now, MODIS LST is prepared by selecting the valid range and converting according to the factors specified in the product help page to LST in °C.

GDAL

Geolocation control points (GCPs) work better than Geolocation array, because those are 5x smaller (aggregated) in comparison to the real LST value array. The warping procedure is taylored to what the HEG-Tool does, i.e. Nearest Neighbor resampling, thin plate spline transformer based on available GCPs.

The target resolution was gathered from the HEG tool as well. Y pixel resolution was always about 0.0093 degrees, which is what is used here, X valied between 0.06 and sometimes even 0.02 - I chose 0.05 here, as the low value seemed to have been an outlier, most resolutions were about 0.04 to 0.06. Then I played around to get closest to the 1000x1000m resolution, which was reached with a tr of c(0.0441,0.0096).

The target extent is set to the WGS84 extent of the research area (158.5, -78.9, 164.7, -76.0).

# get LST subdataset
hdf4_dataset <- system.file(hdffilepath, package="gdalUtils")
sds <- get_subdatasets(hdffilepath)
sds[1]
## [1] "HDF4_EOS:EOS_SWATH:E:/new_downscaling/data_download_preprocessing/MODIS/2019-01/hdfs/MOD11_L2.A2019024.1320.006.2019030111707.hdf:MOD_Swath_LST:LST"
# use gdalwarp
gdalwarp(sds[1], 
         dstfile = paste0(hdfdir,"LST_warp_", tools::file_path_sans_ext(basename(hdffilepath)), ".tif"),
           tps=T, # Force use of thin plate spline transformer based on available GCPs.
           #rpc=T, # Force use of Geolocation Arrays or RPC
           verbose=TRUE,
           s_srs = wgsproj,
           t_srs = wgsproj,
           overwrite = T,
           tr = c(0.0441,0.0096),
           te = c(158.5, -78.9, 164.7, -76.0), 
           r="near")
## NULL
LSTwarped <- raster(paste0(hdfdir,"LST_warp_", tools::file_path_sans_ext(basename(hdffilepath)), ".tif"))
mapview(LSTwarped)
# project to antaproj
Mprojected <- projectRaster(LSTwarped, crs=antaproj)
Mprojected
## class      : RasterLayer 
## dimensions : 359, 251, 90109  (nrow, ncol, ncell)
## resolution : 1000, 1000  (x, y)
## extent     : 314132.8, 565132.8, -1479585, -1120585  (xmin, xmax, ymin, ymax)
## crs        : +proj=stere +lat_0=-90 +lat_ts=-71 +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
## source     : memory
## names      : LST_warp_MOD11_L2.A2019024.1320.006.2019030111707 
## values     : 11657, 13555  (min, max)
# maks by aoi
Mmasked <- mask(Mprojected, aoianta)

# convert to °C
# Valid Range = 7500-65535
Mmasked[Mmasked == 0 ] <- NA
Mmasked[Mmasked < 7500 & Mmasked > 65535] <- NA

#Mmasked[[i]][Mmasked[[i]] < 7500 & Mmasked[[1]] > 65535] <- NA

# scale factor = 0.02
MLST <- Mmasked*0.02

# convert to degree C
MLSTC <- MLST - 273.15
mapview(MLSTC)
writeRaster(MLSTC, "D:/example_M_LST_C.tif", overwrite=T)

comparing LST from MODIS and from L8

Landsat shows slightly lower values than MODIS at 1km resolution.

ggplot(extr, aes(x=L8, y=MODIS))+
  geom_point()+geom_abline(intercept = 0, slope=1)+
  geom_smooth(method=lm)+theme_classic()+
  ggtitle("LST from L8 (resampled to 1km) and MODIS")

Available scenes

source("2_2a_check_for_differences.R")
timediff_scene_plot_TA

timediff_scene_plot_A

Checking for provenance of gaps in available scenes

First, look at 12h apart from minimum timedifference for all MOD / L8 scene combinations.

## 
##  TRUE 
## 73804

Now, look at differences < 3h:

Local time NZDT

Local time of all vs. selected scenes, y axis should be understood as “HH.MM”.

## [1] "11.40"
## [1] "23.55"
## [1] "14.10"
## [1] "16.20"
## [1] "14.30"
## [1] "16.20"

solar position coverage for the selected scenes throughout the year

Azimuth and Altitude

Azimuth and Altitude image source

Now for each pixel location

Testing solar positions throughout a day and the year

One day in January (“2019-01-02 00:00:00 UTC” “2019-01-02 01:00:00 UTC” …. “2019-01-02 23:00:00 UTC”)

During the year at the same time of day (“2019-01-01 14:00:00 UTC”, “2019-01-02 14:00:00 UTC”, … “2019-12-31 14:00:00 UTC” )

–> there is not enough azimuth change throughout a full year at the same time of day as to justify that the model was able to learn all sun positions from this, but we can argue that there are many more or less shaded pixels available in the research area, so that we can still expect the model to be able to pick up all it needs.

Open questions:

  • What does -1 in Landsat Land Cloud Cover mean? Missing value I assume
  • Is it problematic that we only get T2 datasets?

#to do for final run:

\(\checkmark\) band 10/11 reference (literature)

\(\checkmark\) MODIS: take a look at Error_LST: 0.2 to 1.8 - looks ok, not sure whether it helps to include that information, because L8 LST is also biased, but I don’t get an error measurement there. Only thing could be to incorporate it in the extraction table and then see if training with only low error pixels will work better?

\(\checkmark\) use EMISSIVITY from modis product? Won’t… doesn’t look promising because patchy & resolution issue - assuming emissivity for the two contrasting land cover types at 30m resolution seems to be the more sensible option

\(\checkmark\) use Aqua and Terra for training? Only Aqua, below 15min time difference available here

  • training test areas

  • SWIR raus

  • hs & ia predict function (neue MODIS scene, MODELL + hs / is Berechnung -> prediction)

  • hypercube vs. mein sampling

  • apply AOA to final raster